Introduction
Topic and Motivation:
This analysis explores demographic, economic, and fiscal data to identify trends and patterns across U.S. states over time, focusing on variables such as age distributions, debt-to-income ratios, GDP, and state revenue sources. This work will set the stage for further analysis on how these factors correlate and vary by geography and time, providing context for evaluating economic health and demographic changes. Our primary focus is identifying causal relationships between our county level data and household debt-to-income ratios, controlling for numerous State level policies.

Data Description:
The dataset combines demographic data, state GDP, household debt-to-income ratios, and state fiscal revenue data. Key outcome variables include age distribution proportions, debt-to-income ratios, and state tax revenue by source. Data spans 2000–2023, aggregated at the state level, and has been cleaned and merged to facilitate consistent analysis.

Data Processing
Wrangling and Cleaning:
The data was imported from multiple CSV files, merged based on common fields (County/State FIPS and Year), and cleaned for consistency. Variables were standardized in terms of data types and units. Quarterly debt-to-income ratios were aggregated to annual data by taking the mean per county and year, and population age demographics were converted to proportions of the total population for each county in each year from the gross totals. We will likely end up logging most of the fiscal variables when we run regressions to account for long right tails in county level data (e.g. there are many more counties without major cities than there are with major cities, such as New York City).

Merging Datasets:
Demographic and GDP data were merged, followed by merging this combined dataset with debt-to-income data and fiscal revenue data. The final dataset reflects state-level statistics, allowing for longitudinal and categorical analyses.

Key Variables:
Our primary variables of interest are age distributions by category, debt-to-income ratios, real GDP, and revenue by tax source. Visualizations aim to illustrate trends, relationships, and distributions among these variables.

Debt-to-Income Ratios

Debt-to-Income Ratios:

This shows the distribution of household debt-to-income ratios over time, which will serve as our dependent variable in our modelling. The left plot shows the lower bound estimate from our dataset, and the right show the upperbound. They do not seem to be drastically different, but we include both for the time being and may utilize both in separate regressions to serve as a robustness check. Ex ante, the sharp rise and subsequent decline after 2011 seems to reflect the impacts of the post- 2008 recession recovery. For this and the subsequent graphics, we aggregated all of the counties for the time being to show the overall national trends, rather than having 51 State graphs overlapping (even more counties). Our modeling will focus on county level trends, however.

Age Group Distribution

Age Group Distribution:

This shows the proportion of the total population in each age catagory overtime. The largest visual pattern is the aging of the Baby Boomer generation, which is at current retiring in large numbers.

Real GDP Histogram

Real GDP Histogram:
This visualizes the county-level real GDP over the years. While the mean remains relatively stable, the individual observations move around quite a bit.

Tax Revenue Over Time

Tax Revenue Over Time:

Both the total state tax revenue data as well as its breakdown by major tax categories show a major spike after 2016/2017. This runs counter to what we would expect, with the SALT tax deductions being capped at $10,000 in 2018, which would incentivize states to lower their own taxes because their citizens can’t deduct as much from their federal taxes. This is something we may want to investigate as we do further analysis

Tax Revenue Over Time:

Tax Revenue Over Time:

This heat map shows the proportion of a State’s population that is calculated as “Retirement Aged.” This term is loose, but for the purposes of this analysis has been assigned to individuals age 65 and older. A notable piece of this visualization is that there do not appear to be any states with declining proportions of elderly demographics.